Parallel Web Spiders for Cooperative Information Gathering

Authors

  • Jiewen Luo
  • Zhongzhi Shi
  • Maoguang Wang
  • Wei Wang
Abstract

Web spiders are a widely used means of obtaining information for search engines. As the Web grows, parallelizing the spider's crawling process becomes a natural choice. This paper presents a parallel web spider model, based on a multi-agent system, for cooperative information gathering. It uses a dynamic assignment mechanism to eliminate the redundant web pages caused by parallelization. Experiments show that the parallel spider improves information gathering performance while keeping the interaction overhead required for control at an acceptable level. This approach provides a novel perspective for the next generation of advanced search engines.

Similar resources

Modification to Fish algorithm and Integration with Web Retrieval Systems

In this paper, we first look into the drawbacks of common technologies used in selective web information gathering systems, of which the Fish algorithm is the best known. Because of its inherent weaknesses, the Fish algorithm wastes a great deal of end-user time and effort when spidering for valuable information. We present a new kind of system, the TH-Gatherer system. It makes some important modifications to the Fish algorithm, such...

An Adaptive, Ontology-Based Information Gathering Multi-Agent System for Restricted Web Domains

Due to Web size and diversity of information, relevant information gathering on the Web turns out to be a highly complex task. The main problem with most information retrieval approaches is neglecting pages’ context, given their inner deficiency: search engines are based on keyword indexing, which cannot capture context. Considering restricted domains, taking into account contexts, with the use...

A Cooperative Planning Algorithm to Improve Performance in Web Domains

In this paper, we present MAPWeb, a multiagent framework that integrates planning agents and Web information retrieval agents. The goal of this framework is to deal with problems that require planning with information to be gathered from the Web. For reasons of flexibility and efficiency, MAPWeb decouples planning from information gathering by splitting a planning problem into two parts: ...

AGATHE : une architecture générique à base d'agents et d'ontologies pour la collecte d'information sur domaines restreints du Web

Gathering relevant information on the Web is a very complex task. The main problem with most information retrieval approaches is that they neglect the context of the pages, mainly because search engines are based on keyword indexing. For a restricted domain, taking this context into account is possible and should lead to more relevant information gathering. In this paper, is proposed a ...

Collecte d'information sur domaines restreints du web à base d'agents et d'ontologies. Le système AGATHE

Gathering relevant information on the Web is a very complex task. The main problem with most information retrieval approaches is that they neglect the context of the pages, mainly because search engines are based on keyword indexing. For a restricted domain, it is possible to take this context into account, which should lead to more relevant information gathering. In this paper, a specifi...

Publication date: 2005